A Deterministic Syntactic-Semantic Parser
Abstract
We consider that we have made a decisive step towards determinism in parsing. We agree with Winograd's hesitation to evaluate the determinism hypothesis as formulated by Marcus. However, this does not make us doubt the possibility of determinism; on the contrary, we examined not only how to improve over Marcus, but also the historical reasons for the non-determinism of most systems. Our improvements are based on two principles: syntactic-semantic integration, and quasi-simultaneousness. The first means that there is no such thing as "the autonomy of syntax" (Marcus); so, we agree with Schank and, further, we showed that local semantic ambiguities could be solved deterministically (Marcus (ch. 10) claims that these ambiguities need parallel processing). The second permits the processing of structures too difficult for PARSIFAL, e.g. locally ambiguous PP attachments. Detailed examples support our proposals.

I POSSIBLE FACTORS OF NON-DETERMINISM

With the notable exception of Marcus, already mentioned, most "parser makers" seem to agree on having a non-deterministic strategy (backtracking or parallelism). Schank (et al.) have always perceived the inadequacy of those strategies, and practised the integration of syntax and semantics, but they have conserved a top-down approach ('prediction', cf. Wilks). We shall try to show the importance of 4 factors of non-determinism which are not related to NL ambiguity: the architecture of systems, rules and interpreter, stack mechanism, and word-by-word processing.

A. Architecture of systems

Decision elements for a parser are of 4 kinds: morphosyntactic, semantic, pragmatic and contextual.
So, it is obvious that considering only one of them brings about non-determinism, independently of NL itself. However, even when a system uses all of this information, non-determinism may appear due to an artificial separation between modules (e.g. syntactic and semantic). Some non-determinism remains even in the most sophisticated modular systems, and that is because:
- one module is dominating the others (as in SHRDLU)
- too many interaction points exist
- the non-determinism may find refuge in one of the modules (as in PARSIFAL, the semantic one).

B. Representation for a grammar and interpreter

The interpreter usually has either a top-down or a bottom-up strategy, but most frequently top-down. But a relative clause, for instance, should rather be parsed bottom-up, in order to be attached deterministically. In theory, the Marcus parser seems to permit this strategy ("attention shifting rules"). But the grammar Marcus actually wrote does not take advantage of this facility, and relative clauses are also parsed top-down. From his theoretical work, we still have the confirmation of the following point: a completely top-down or completely bottom-up mechanism must parse non-deterministically some sentences that would not give the same trouble to a mixed strategy. (In the field of programming languages, such a mixed strategy has already been applied, Earley.)

Now, another important point: in a traditional declarative system (ATN or PSG), the rules are independent of the interpreter. So, when a conflict arises between two valid rules, it is not possible for the interpreter to make a decision.
Hence, one has to limit the role of the interpreter, and the rules must be non-ambiguous (their condition part being as detailed as possible).
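The point about detailed condition parts can be illustrated with a small sketch. It is not the authors' rule formalism: the `Rule` class, the `(word, category)` token encoding, the `PASTPART` category label and the two "have" rules are all hypothetical, chosen only to show that when two rules share an underspecified condition the interpreter cannot choose, whereas conditions that also inspect the next buffer item leave exactly one applicable rule.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Token = Tuple[str, str]  # (word, category); a hypothetical toy encoding


def word(buffer: List[Token], i: int) -> Optional[str]:
    """Word at buffer position i, or None past the end."""
    return buffer[i][0] if i < len(buffer) else None


def category(buffer: List[Token], i: int) -> Optional[str]:
    """Category at buffer position i, or None past the end."""
    return buffer[i][1] if i < len(buffer) else None


@dataclass
class Rule:
    name: str
    condition: Callable[[List[Token]], bool]  # the rule's condition part


def applicable(rules: List[Rule], buffer: List[Token]) -> List[str]:
    """Names of all rules whose condition holds for this buffer."""
    return [r.name for r in rules if r.condition(buffer)]


def select(rules: List[Rule], buffer: List[Token]) -> str:
    """Return the single applicable rule; fail if the choice is not unique."""
    matches = applicable(rules, buffer)
    if len(matches) != 1:
        raise ValueError(f"no deterministic choice among: {matches}")
    return matches[0]


# Underspecified conditions: both rules fire whenever the buffer starts with "have".
LOOSE_RULES = [
    Rule("have-as-auxiliary", lambda b: word(b, 0) == "have"),
    Rule("have-as-main-verb", lambda b: word(b, 0) == "have"),
]

# Detailed conditions: the category of the next item separates the two cases.
STRICT_RULES = [
    Rule("have-as-auxiliary",
         lambda b: word(b, 0) == "have" and category(b, 1) == "PASTPART"),
    Rule("have-as-main-verb",
         lambda b: word(b, 0) == "have" and category(b, 1) != "PASTPART"),
]

if __name__ == "__main__":
    buffer = [("have", "V"), ("eaten", "PASTPART")]
    print(applicable(LOOSE_RULES, buffer))   # two candidates: a conflict
    print(select(STRICT_RULES, buffer))      # exactly one candidate
```

With the loose rules, `select` raises an error because the interpreter has no basis for a decision; with the strict rules the conditions themselves carry the decision, so the interpreter only has to apply the one rule that matches.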
Similar resources
Semantic Role Labeling of Persian Sentences Using a Memory-Based Learning Approach
Abstract Extracting semantic roles is one of the major steps in representing text meaning. It refers to finding the semantic relations between a predicate and syntactic constituents in a sentence. In this paper we present a semantic role labeling system for Persian, using memory-based learning model and standard features. Our proposed system implements a two-phase architecture to first identify...
Feature Engineering in Persian Dependency Parser
A dependency parser is one of the most important fundamental tools in natural language processing; it extracts the structure of sentences and determines the relations between words based on dependency grammar. Dependency parsers are well suited to free-word-order languages such as Persian. In this paper, a data-driven dependency parser has been developed with the help of phrase-structure parser fo...
Automatic Semantic Role Labeling in Persian Sentences Using Dependency Trees
Automatic identification of words with semantic roles (such as Agent, Patient, Source, etc.) in sentences and attaching correct semantic roles to them, may lead to improvement in many natural language processing tasks including information extraction, question answering, text summarization and machine translation. Semantic role labeling systems usually take advantage of syntactic parsing and th...
On GB Parsing and Semantic Interpretation
The paper shows how sentences containing scope ambiguities can be assigned syntactic and semantic structures by means of sloppy deterministic processing techniques only. The semantic framework is Discourse Representation Theory, and the sloppy deterministic parser is described in Nordgård (1993). Of primary concern for the article is the transition from syntactic structures to discourse represe...
Adding Semantic and Syntactic Predicates To LL(k): pred-LL(k)
Most language translation problems can be solved with existing LALR(1) or LL(k) language tools; e.g., YACC [Joh78] or ANTLR [PDC92]. However, there are language constructs that defy almost all parsing strategies commonly in use. Some of these constructs cannot be parsed without semantics, such as symbol table information, and some cannot be properly recognized without first examining the entire const...
Studying impressive parameters on the performance of Persian probabilistic context free grammar parser
In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, The Penn Treebank, was published. However, although originating in computational linguistics, the value of tree bank is becoming more widely appreciated in linguistics research as a whole. F...